Md Rahman

AI Engineer

Cisco IT

Md Atiqur Rahman, PhD, is an AI Engineer and architect at Cisco, where he designs and leads the development of enterprise-scale generative AI platforms for secure and regulated environments. His work focuses on building cloud-agnostic, on-prem–first AI systems that enable teams across Cisco to rapidly develop Retrieval-Augmented Generation (RAG) pipelines, agentic workflows, and multimodal AI applications at enterprise scale. By architecting distributed AI infrastructure, he helps transform complex AI development processes into streamlined platforms that allow teams to build production AI systems in minutes rather than months.

Rahman specializes in scalable AI infrastructure, including Kubernetes-based microservices, distributed event-driven architectures using Kafka and Redis, and high-performance retrieval systems built on vector search and hybrid ranking techniques. His work also includes building large-scale document ingestion systems for diverse enterprise data formats and integrating Vision-Language Models (VLMs) to extract knowledge from complex image-rich documents. He focuses on optimizing retrieval performance, observability, and pipeline tracing to ensure reliability across large-scale AI workloads.

Rahman holds a PhD in Computational & Data-Enabled Science & Engineering and is the inventor of multiple U.S. patents in machine learning and network intelligence. Before leading GenAI platform initiatives, he developed predictive machine learning systems that improved network reliability for Cisco customers. He is passionate about translating cutting-edge AI research into scalable, production-ready platforms that accelerate enterprise AI adoption.

Articles

Fine-Tuning Embedding Models for Enterprise Retrieval: A Practical Guide with NVIDIA Nemotron Recipe

5 min read

Cisco IT recently evaluated fine-tuning embedding models using the NVIDIA Nemotron RAG fine-tuning recipe as part of an effort to improve retrieval accuracy for domain-specific enterprise data. The objective was not to redesign existing retrieval-augmented generation (RAG) systems, but to understand whether targeted embedding fine-tuning could materially improve semantic search quality with reasonable effort and fast turnaround. Through this experiment, Cisco validated firsthand that embedding fine-tuning, combined with synthetic data generation, can deliver measurable accuracy gains within a short time frame. The experiment also demonstrated strong time-to-value, enabling rapid iteration and clear performance signals without long training cycles or extensive manual labeling. A key outcome of the collaboration was the short turnaround: only a few days were needed to understand the immediate benefits. The embedding model training and evaluation workflow was executed on Cisco AI PODs running Cisco UCS 885A infrastructure powered by the NVIDIA HGX platform.
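To give a flavor of what embedding fine-tuning optimizes, the sketch below implements the in-batch contrastive (InfoNCE-style) objective commonly used for retrieval embedding models, together with a recall@k check of the kind used to measure retrieval accuracy. This is a minimal NumPy illustration of the general technique, not the actual Nemotron recipe: the function names, the toy data, and the temperature value are assumptions for demonstration only.

```python
import numpy as np

def info_nce_loss(q, p, temperature=0.05):
    """In-batch contrastive loss on L2-normalized embeddings.

    q, p: (batch, dim) arrays of query and positive-passage embeddings.
    Each query's positive is the passage at the same index; the other
    passages in the batch act as negatives. (Illustrative only; the
    actual recipe's objective and hyperparameters may differ.)
    """
    sims = q @ p.T / temperature  # (batch, batch) similarity matrix
    # Row-wise log-softmax; the target for row i is the diagonal entry.
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def recall_at_k(q, p, k=1):
    """Fraction of queries whose true passage ranks in the top k."""
    sims = q @ p.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    return float(np.mean([i in topk[i] for i in range(len(q))]))

# Toy example: perfectly aligned query/passage pairs give perfect recall
# and a lower loss than mismatched pairs.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
q = base / np.linalg.norm(base, axis=1, keepdims=True)
p = q.copy()                                # aligned positives
print(recall_at_k(q, p, k=1))               # 1.0
print(info_nce_loss(q, p) < info_nce_loss(q, np.roll(p, 1, axis=0)))
```

In practice, synthetic query–passage pairs generated from enterprise documents would stand in for the toy vectors here: the fine-tuning loop pulls each query's embedding toward its matching passage and away from the in-batch negatives, which is what drives the retrieval accuracy gains the experiment measured.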